Maximal strength measurement: A critical evaluation of common methods—a narrative review

Measuring maximal strength (MSt) is a very common form of performance diagnostics, especially in elite and competitive sports. The most popular procedure in test batteries is the one repetition maximum (1RM) test. Since testing maximal dynamic strength is very time-consuming, it is often suggested to use isometric testing conditions instead. This suggestion is based on the assumption that high Pearson correlation coefficients (r ≥ 0.7) between isometric and dynamic conditions indicate that both tests provide similar measures of MSt. However, r describes the relationship between two parameters but makes no statement about the agreement or concordance of two testing procedures. Hence, to assess replaceability, the concordance correlation coefficient (ρc) and Bland-Altman analysis, including the mean absolute error (MAE) and the mean absolute percentage error (MAPE), seem to be more appropriate. In an exemplary model, r = 0.55 yielded ρc = 0.53, an MAE of 413.58 N, and a MAPE of 23.6%, with a range of −1,000 to 800 N within the 95% confidence interval (95% CI). For r = 0.7, ρc was 0.68, with an MAE of 304.51 N, a MAPE of 17.4%, and a range of −750 to 600 N within the 95% CI; for r = 0.92, ρc was 0.9, with an MAE of 139.99 N, a MAPE of 7.1%, and a range of −200 to 450 N within the 95% CI. This model illustrates the limited validity of correlation coefficients for evaluating the replaceability of two testing procedures. The interpretation and classification of ρc, MAE, and MAPE seem to depend on the expected changes in the measured parameter. A MAPE of about 17% between two testing procedures can be assumed to be intolerably high.

Especially in elite and competitive sports, the effectiveness of training routines is evaluated with performance testing (36) to monitor training progress (24) and, if necessary, adapt training routines. Various methods based on a maximal voluntary contraction (MVC) have been used to measure maximal strength; the most common are isometric testing (maximal voluntary isometric contraction, MVIC) and dynamic testing using the one repetition maximum (1RM) (24,27,37).
Advocates of isometric testing highlight its supposed advantages regarding the quantification of various force-time characteristics (1, 38-40), simple and time-efficient administration (1), and good standardization of test conditions with high test-retest reliability (36, 41). Furthermore, isometric strength tests are considered highly sensitive to changes in strength, with minimal coordination requirements and minimal injury risk, and are supposedly less fatiguing than 1RM test protocols (27,36,42).
Additionally, force-time characteristics such as the rate of force development (RFD) or isometric impulse are considered to provide information on various dynamic strength qualities (24, 43). However, dynamic strength measurement using the one repetition maximum (1RM) is described as the most popular strength assessment method, because no expensive equipment such as force plates or strain gauges is required (44). Since good reliability can be assumed (Seo et al., 2012), a substantial number of studies have investigated the 1RM in the bench press, back squat, or clean (18, 45-61). To date, there is conflicting evidence about the external validity of various strength testing methods (62), especially when considering the associations between isometric and dynamic performance (43, 63-66). Training and testing specificity (i.e., testing should involve tasks similar to the task or type of training) has been a hallmark of sport and exercise science (67).
Accordingly, a potentially higher transfer of dynamic testing measures to speed-strength performances such as sprinting, jumping, and rapid changes of direction (e.g., agility) seems plausible (63,64,68). Additionally, 1RM testing provides reliability comparable to isometric testing, with higher validity for estimating maximal strength capacity (36).

Problem
Nevertheless, the supposed advantages of isometric testing and the additional information it provides on force-time characteristics have led multiple authors to suggest replacing 1RM testing with isometric testing to monitor athletes' training progress (1,24). McGuigan et al. (69) state: "Given that the test seems to indicate to a large extent the dynamic performance characteristics of athletes, it may not be necessary to perform 1RM testing on a large number of exercises". While isometric tests appear to provide valuable information on force-time characteristics, the replaceability of dynamic testing by isometric testing is primarily justified by "high" Pearson's correlation coefficients (r) or intraclass correlation coefficients (ICC). McGuigan et al. (1) also proposed that "Strength and conditioning coaches and other practitioners with access to a force plate can consider using the isometric mid-thigh pull test as a potential alternative to traditional 1RM testing. In recreationally trained subjects, it appears to correlate extremely well with both the 1RM squat and bench press". Accordingly, numerous studies (see Table 1) have reported correlations between isometric and dynamic testing conditions. However, to validly claim that isometric testing can replace dynamic testing within performance diagnostic protocols, a high concordance between the methods must be demonstrated. Yet none of the studies in the literature calculated concordance correlation coefficients between isometric and dynamic measurements to verify whether one measurement can actually reproduce the results of the other. This is particularly important when it is suggested that 1RM bench press testing be replaced by the isometric mid-thigh pull (69), a proposal of questionable validity.
Hence, the primary aim of this study is to assess the validity of replacing 1RM testing with isometric testing by comparing Pearson and concordance correlation coefficients. To provide more detailed information, the mean absolute error (MAE) and mean absolute percentage error (MAPE) will also be reported to quantify differences between isometric and dynamic testing.

Critical evaluation of commonly performed concordance determination
Investigating the concordance between two measurement devices is a well-known problem in medicine (85-88). In the literature, merely stating correlation coefficients seems insufficient, since agreement and correlation are two different concepts: "Agreement is a concept that is closely related to, but fundamentally different from and often confused with correlation." (89). Because the concepts differ, correlation analysis must be considered inappropriate for investigating agreement. Investigating agreement requires high reproducibility of the measurements. The assumption that both measurement devices measure the same parameter must be validated by determining the deviation between the two devices, which indicates the degree of concordance or lack thereof. Pearson correlation coefficients only describe the relationship between two parameters and provide no information about the agreement between two testing conditions (85,88). The concordance correlation coefficient can be used instead: it combines the Pearson correlation with a measure of how far the regression line deviates from the 45° line of identity through the origin of the coordinate system (88, 90-92). Furthermore, if two testing methods measured the same parameter, there should be very little variance between them. To illustrate the variance between two testing conditions, Bland-Altman analysis is recommended (85-87, 89, 90, 93). Since the Bland-Altman plot is mainly used for qualitative, visual analysis of variance, the MAE and MAPE are used to quantify the error between the two testing conditions. The MAE is a measure of the error between paired observations of the same parameter (94,95), while the MAPE (96-98) can be seen as an expression of accuracy, providing quantitative information about the deviation between two measurement techniques.
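To illustrate why a high Pearson correlation does not imply agreement, the following Python sketch computes Pearson's r and Lin's concordance correlation coefficient for a hypothetical dataset in which the dynamic value is always exactly 10% above the isometric value. It uses Lin's formula with population moments; the function names and the data are illustrative assumptions, not the authors' analysis code.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson's r: strength of the linear relationship between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient (population moments).

    Equals 1 only if all points lie on the 45-degree identity line y = x;
    any systematic offset or proportional bias lowers it.
    """
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


# Hypothetical data (N): the dynamic value is always 10% above the isometric one.
isometric = [1000, 1200, 1400, 1600]
dynamic = [1100, 1320, 1540, 1760]
print(f"r   = {pearson_r(isometric, dynamic):.3f}")  # 1.000: perfect linear relationship
print(f"CCC = {lin_ccc(isometric, dynamic):.3f}")    # 0.863: imperfect agreement
```

Although r is perfect here, the CCC is clearly below 1 because every pair deviates from the line of identity.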
Both the MAE and the MAPE thus quantify the difference between a measured and a predicted value and have been used to validate testing batteries (99,100). In other research fields, such as pharmacology and medicine, the concordance correlation coefficient and Bland-Altman analysis are commonly used to evaluate the accuracy, validity, and reliability of blood pressure or heart rate devices (91, 92, 101). Since there is no common classification into high, moderate, and low concordance, as exists for Pearson correlations (e.g., r = 0.2-0.5 small, r > 0.5-0.7 moderate, r > 0.7 high correlation) (102), it is suggested to classify these effects depending on the content. Given moderate increases in maximal strength of, for example, 10%-12% within six weeks of training (103), 13.3% within 10 months in elite U19 soccer players (35), or 12 ± 2% to 19 ± 2% in elite cross-country skiers following 12 weeks of strength training (104), replacing 1RM testing with isometric testing requires high concordance with very little variance, as derived from the MAPE. If two tests differ by 6% but the training-induced change is 12%-13%, the discrepancy between the two measures would amount to roughly 50% of the expected strength change, which would not provide acceptable sensitivity or validity. Accordingly, Dominguez-Jiménez et al. (105) described concordance correlation coefficients of 0.68-0.8 between blood sample measuring devices as poor, suggesting that the threshold for poor agreement would be a concordance correlation coefficient <0.9 (106). Consequently, assuming high correlation coefficients, the concordance correlation coefficient and Bland-Altman analysis, including MAPE and MAE, were calculated for r = 0.9 and r = 0.7. Data were compiled from previous investigations.
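The Bland-Altman statistics, MAE, and MAPE referred to in this paragraph can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the limits of agreement use the conventional bias ± 1.96 SD, the MAPE divides by the isometric value x as the reference (the choice of reference is an assumption), and the paired values are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(x, y):
    """Bias (mean difference x - y) and 95% limits of agreement (bias +/- 1.96 SD)."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def mae(x, y):
    """Mean absolute error, in the units of the data (here N)."""
    return mean(abs(a - b) for a, b in zip(x, y))


def mape(x, y):
    """Mean absolute percentage error (%), using x as the reference (assumption)."""
    return 100 * mean(abs((a - b) / a) for a, b in zip(x, y))


# Hypothetical paired values (N): x = isometric, y = dynamic.
x = [1000, 1200, 1400, 1600]
y = [1100, 1150, 1500, 1580]
bias, lower, upper = bland_altman(x, y)
print(f"bias = {bias:.1f} N, 95% limits of agreement = [{lower:.1f}, {upper:.1f}] N")
print(f"MAE  = {mae(x, y):.1f} N")
print(f"MAPE = {mape(x, y):.2f} %")
```

The MAE reports the typical deviation in newtons, whereas the MAPE expresses it relative to the reference value, which makes it directly comparable with expected training-induced changes in percent.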
The MAE was determined as

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| x_i - y_i \right|,$$

and the mean absolute percentage error (MAPE) as

$$\mathrm{MAPE} = \frac{100\,\%}{n}\sum_{i=1}^{n} \left| \frac{x_i - y_i}{x_i} \right|,$$

where n is the number of participants, x_i the isometric strength value, and y_i the dynamic strength value.

The results of this exemplary calculation show that correlations classified as high (r ≥ 0.7), which are partly higher than the correlations reported in the literature (r = 0.52-0.97) (7,48,69,79,80,107), do not seem sufficient to establish that isometric testing can replace dynamic testing. Given expected increases of 10%-19% after 6-24 weeks of strength training, a MAPE of 7%-17% between isometric and dynamic testing appears intolerably high with respect to scientific quality criteria (see Table 2). Both measurement techniques may be reliable and valid for estimating specific metrics of maximal strength capacity; however, they must be assumed to estimate maximal strength capacity in different ways and therefore to yield different results. The rationale for replacing 1RM tests with isometric testing (69) must therefore be rejected. The Bland-Altman analyses, showing a variation of values within the 95% CI from −200 to 450 N for r = 0.92 (Figure 1) and from −750 to 600 N for r = 0.7 and −1,000 to 800 N for r = 0.55 (Figure 2), support the assumption that isometric and dynamic testing yield substantially different strength estimates. Although Pearson correlation coefficients and ICC values examine the relationship between two parameters, using their common classifications to examine whether two measurements are interchangeable is a misinterpretation of statistics and should be avoided. In accordance with Cohen (1988), effect sizes should be classified in light of the content. Accordingly, the substantially stricter thresholds proposed by Cataldi et al. (106), with poor (<0.90), moderate (0.90-0.95), substantial (0.95-0.99), and almost perfect (>0.99) concordance, seem more appropriate.

To conclude, if the expected change in maximal strength due to an intervention is smaller than the mean (percentage) error between two testing methods, it cannot be assumed that both methods measure the same parameter, and replacing one with the other should be avoided. Therefore, the objective of an investigation and the procedure chosen to evaluate strength capacity should be selected carefully, as 1RM and isometric testing conditions can be assumed not to measure the same parameter. Several hypotheses may explain why strength values depend on the measurement procedure.
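The decision logic described here can be sketched in Python. The cutoffs follow those attributed to Cataldi et al. (106) in the text; how values falling exactly on 0.95 or 0.99 are assigned is an assumption, and the function names are hypothetical. Note that an expected change exceeding the MAPE is only a necessary condition for replaceability, not a sufficient one.

```python
def classify_ccc(ccc):
    """Label a CCC with the cutoffs of Cataldi et al. (106); boundary handling is assumed."""
    if ccc < 0.90:
        return "poor"
    if ccc < 0.95:
        return "moderate"
    if ccc <= 0.99:
        return "substantial"
    return "almost perfect"


def error_share_of_change(mape_percent, expected_change_percent):
    """Share of the expected training-induced change consumed by the between-test error."""
    return mape_percent / expected_change_percent


def replacement_ruled_out(mape_percent, expected_change_percent):
    """True if the expected change is smaller than the mean percentage error."""
    return expected_change_percent < mape_percent


# CCC values of the exemplary model and the 6% vs. 12% example from the text.
for rho_c in (0.53, 0.68, 0.90):
    print(rho_c, classify_ccc(rho_c))
print(f"{error_share_of_change(6.0, 12.0):.0%} of the expected change")  # 50% of the expected change
print(replacement_ruled_out(17.4, 12.0))  # True: a 12% gain is smaller than a 17.4% MAPE
```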
Explanation of high variation of correlation coefficients and limited agreement between isometric and dynamic testing procedures

From a physical and mechanical point of view, force is defined as F = m * a, so force capability can be described as the ability of the body to accelerate a mass. In a dynamic strength measurement, strength is only assumed to be maximal if the force exerted on the resistance by the individual equals the gravitational force acting on it, so that the mass (resistance) is not accelerated. In isometric force measurements, Newton's third law [for every action (force) in nature there is an equal and opposite reaction] is used to measure the force exerted against an insurmountable resistance. Since performing a one repetition maximum involves moving a surmountable resistance through a range of motion, it is not the same as assessing maximal strength (MVC), which leads to the assumption that 1RM performance would be lower than MVIC. Furthermore, when performing a 1RM, once the initial …

[Figure: An exemplary dataset to calculate the concordance correlation coefficient (CCC) of 0.9 with n = 273, showing a high Pearson correlation of r = 0.92, representing the magnitude of correlation usually found between dynamic and isometric testing.]
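To make the mechanical argument concrete, the Python sketch below applies Newton's second law to a free-weight load moved vertically: from F − m·g = m·a, the lifter must apply F = m(g + a). It is a simplified illustration; the 100 kg load, g = 9.81 m/s², a purely vertical bar path, and ignoring body-segment masses are all assumptions.

```python
STANDARD_GRAVITY = 9.81  # m/s^2, rounded


def applied_vertical_force(mass_kg, accel_ms2=0.0):
    """Vertical force (N) needed to move a free-weight load of mass_kg with
    upward acceleration accel_ms2, from F - m*g = m*a. Simplified model:
    vertical bar path only, body-segment masses and friction ignored."""
    return mass_kg * (STANDARD_GRAVITY + accel_ms2)


# Hypothetical 100 kg load
print(applied_vertical_force(100))       # holding it static: about 981 N
print(applied_vertical_force(100, 0.5))  # accelerating it upward at 0.5 m/s^2: about 1031 N
```

At a true 1RM load, the force the lifter can produce only marginally exceeds m·g, whereas an isometric test measures the force against a resistance that cannot be moved at all.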

Angle specificity in isometric testing conditions
In science and diagnostics, there are high demands on standardization to ensure equal testing conditions and to exclude external factors influencing the results. Because maximal isometric strength is angle specific, it yields different maximal strength values when measured at different joint angles (44, 62, 71,111,112). Angle-specific differences in maximal strength were reported for the squat (44, 113-116), bench press (65), plantar flexion (115), and deadlift/mid-thigh pull (81). Isometric strength capacity in the squat and leg press seems to increase with increasing knee joint angle (44, 113-116). Examples are given in Table 3.
[Figure: An exemplary dataset to calculate the concordance correlation coefficient (CCC) of 0.68 with n = 273, showing a high Pearson correlation of r = 0.7, representing the magnitude of correlation usually found between dynamic and isometric testing.]

Using fixed joint angles to standardize movements is of questionable validity, as it presumes similar levels of flexibility and similar anthropometrics. Participants lacking flexibility could reach maximal muscle length at a smaller joint angle than flexibility-trained participants. Consequently, assuming a muscle length-maximal strength relationship with the highest strength capacity in the "mid-range of motion" (126), standardizing joint angles leads to differences in starting muscle length if participants are heterogeneous in flexibility. Furthermore, EMG activity differs between joint angles, contributing to differences in strength performance (127). The joint angle dependency of MVIC values in turn leads to a joint angle dependency of the correlations between isometric and dynamic MVC testing. Accordingly, Bazyler et al. (107) reported angle-specific high correlations (r = 0.79-0.86) between maximal isometric strength and the 1RM in the squat, concluding that "these findings demonstrate a degree of joint angle specificity to dynamic tasks for rapid and peak isometric force production" (107). However, numerous other factors also influence force output. The aforementioned difficulties in standardizing range of motion and joint angles for individuals with varying levels of flexibility would be problematic not only for closed kinetic chain exercises (e.g., squats, cleans, deadlifts) but also for open kinetic chain exercises such as machine-based knee extension (quadriceps) and flexion (hamstrings) or elbow flexion (biceps brachii) and extension (triceps brachii).
Furthermore, compared with these uni-articular resistance exercises (e.g., knee extensions, biceps curls), the complexity, coordination, balance, and stability demands of multi-joint movements influence force production, making standardization even more difficult.

Familiarization with testing conditions
Another possible explanation is a lack of familiarization with isometric testing conditions (23, 76), arising from structural, neural, and biomechanical differences between isometric and dynamic testing conditions associated with their distinct movement patterns and contraction modes (63,64,128). Accordingly, Baker et al. (43) suggested that isometric and dynamic muscle actions must be understood as different physiological phenomena, as motor unit recruitment and rate coding (firing frequency) may differ between the two contraction forms. Authors have pointed out that three familiarization sessions or a large number of trials (129) were required to obtain high stability and reliability of peak force measurements. Palmer et al. (130) reported relatively high coefficients of variation of 6.6%-19.4% for isometric squat strength, depending on the knee angle. These high coefficients of variation may reflect a learning effect in contracting under isometric conditions. Unfamiliar testing conditions can affect test quality criteria; consequently, adequate reliability of isometric testing is not always achieved (131). Since most athletes can be assumed to be familiar with dynamic conditions through their daily training, it can be hypothesized that most athletes require habituation to the less familiar isometric testing conditions. Lum et al. … (133, 134). Therefore, it could be hypothesized that the type of contraction used in daily training routines influences force output in isometric and dynamic testing, and thus the resulting correlations between both contraction types. However, the sport dependency of force output under isometric vs. dynamic testing conditions requires further research.
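The coefficients of variation cited above express trial-to-trial variability relative to the mean. A minimal Python sketch of a simple within-subject CV (sample SD divided by the mean) follows; the trial values are hypothetical, and the cited studies may have used a different CV variant (e.g., one based on the typical error), so this formula is an assumption.

```python
from statistics import mean, stdev


def coefficient_of_variation(trials):
    """Within-subject coefficient of variation (%) across repeated trials:
    sample SD divided by the mean, times 100."""
    return stdev(trials) / mean(trials) * 100


# Hypothetical peak forces (N) from three isometric squat trials of one athlete
print(f"CV = {coefficient_of_variation([2000, 2100, 1900]):.1f} %")  # CV = 5.0 %
```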

Relevance for the testing practice
Several factors influencing the estimation of maximal strength can lead to considerable errors in cross-sectional study designs, depending on the testing conditions. Since training adaptations can be assumed to be highly specific (67), the question arises how this affects the results of longitudinal study designs. Accordingly, using isometric testing, Yahata et al. (115) showed significant increases in MVIC at an extended muscle length in response to long-term stretch training. As the training can be assumed to have taken place at longer muscle lengths, the training adaptations and strength changes were also specific to the training conditions. Likewise, when isometric and dynamic testing conditions are compared, significant differences in the response to training stimuli would be expected. Warneke et al. (135) showed significant increases in strength capacity under both isometric and dynamic conditions after six weeks of daily stretch training of the calf muscles. However, isometric testing showed a significant increase of 16.8%, whereas 1RM testing showed significant increases of about 25.1%. Furthermore, 1RM testing revealed a significant contralateral force transfer (+11.4%), which was not significant under isometric conditions (+1.4%). Wirth (136) investigated the effects of different weekly training frequencies on maximal dynamic and isometric strength of the biceps brachii. While dynamic testing showed significant increases in 5 of 6 training groups, only one group showed significant increases in MVIC.
Consequently, had Yahata and colleagues (115) tested MVIC exclusively at small joint angles, or had Wirth (136) tested only MVIC, both studies would have underestimated the effects of the training routine because of inappropriate testing conditions. Furthermore, had Warneke et al. (135) followed the advice to replace 1RM testing with isometric testing, they would not have been able to show a significant contralateral force transfer after daily stretch training. Therefore, the tests should not be considered interchangeable; rather, one testing condition should supplement the other. Both testing conditions only estimate MSt capacity, since limitations in both procedures prevent a "real" maximal force output. It is therefore strongly recommended to consider the high specificity of testing and training conditions, and the physiological background of each, when formulating the research hypothesis and the subsequent testing protocol.
Warneke et al. 10.3389/fspor.2023.1105201 Frontiers in Sports and Active Living

Conclusion
The use of correlation coefficients to justify replacing 1RM testing with isometric testing seems invalid, since the MAPE and MAE between the two measurement procedures are intolerably high, even with high correlation coefficients and large sample sizes. Investigating the agreement between two measurement conditions requires further analytic approaches, such as concordance and Bland-Altman analyses with classification of MAPE and MAE values. Investigations using such analyses are very rare in exercise science. The results show that 1RM and MVIC provide different estimates of the participant's maximal strength capacity. Therefore, the assumption that dynamic performance and isometric testing conditions provide equivalent measures (24, 84) should be questioned. These estimates are presumably influenced by very different factors, such as the muscle length tested in isometric testing, the complexity of the movement in dynamic testing, and familiarization with the testing conditions given the types of contraction used in daily training practice.

Practical applications
When using maximal strength tests in practice (performance diagnostics in sports or pre-post-test designs in scientific studies), authors should consider their limitations and minimize them. Since a higher transfer of 1RM to sport-specific movements can be assumed and most athletes use dynamic movements in their daily training routines, dynamic testing protocols seem preferable for field tests (64, 109): "From this, it could be recommended to use dynamic strength testing and avoid isometric strength testing, if the athletes training routine includes only low level of isometric contractions, and vice versa." (109). However, under laboratory conditions and depending on the research questions, isometric procedures can also be useful, especially because they save time. Whether, and to what extent, isometric testing conditions can be considered safe might depend on the tested movement. The safety benefits of the isometric squat, in which the spine is pushed against an unyielding resistance, may be questionable, whereas for other movements such as plantar flexion, isometric measurement seems to be a safe testing condition. High test specificity (often involving dynamic testing) and relevant physiological issues (often necessitating isometric testing) should be included in the testing design to answer research questions adequately. To avoid missing potential training effects, authors and coaches should be aware of the underlying physiological mechanisms of their training to determine target-oriented testing programs; otherwise, too many parameters (e.g., different joint angles) would have to be considered if all possible movement executions were to be tested.

Author contributions
KW1 performed the analytic calculations and took the lead in writing the manuscript with support from CMW and MK. KW1, MH, and KW2 conceived the main conceptual ideas in consultation with DB and SS. KW2 and SW supervised the statistical analysis and provided critical feedback on the design of the study and the statistical analysis. All authors discussed the results and contributed to the final version of the manuscript. All authors contributed to the article and approved the submitted version.

Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.